Extending Task Parallelism For Frequent Pattern Mining
Authors
Abstract
Algorithms for frequent pattern mining, a popular informatics application, have unique requirements that are not met by any of the existing parallel tools. In particular, such applications operate on extremely large data sets and have irregular memory access patterns. Parallelizing them efficiently requires dynamic load balancing together with scheduling mechanisms that allow users to exploit data locality. Given these requirements, task parallelism is the most promising of the available parallel programming models. However, existing solutions for task parallelism schedule tasks implicitly, so custom scheduling policies that can exploit data locality cannot be easily employed. In this paper we demonstrate and characterize the speedup obtained in a frequent pattern mining application using a custom clustered scheduling policy in place of the popular Cilk-style policy. We present PFunc, a novel task parallel library whose customizable task scheduling and task priorities facilitated the implementation of our clustered scheduling policy.
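The contrast between a Cilk-style order and a clustered order can be illustrated with a toy sketch. This is not PFunc's API and not the paper's actual policy: the function names, the task tuples, and the idea of using the first tuple field as a locality key are all hypothetical, chosen only to show why grouping tasks that touch the same data may reduce cache-unfriendly switches.

```python
from collections import defaultdict

def lifo_order(tasks):
    """Order in which one worker runs its own tasks under a LIFO policy
    (most recently spawned first), as in Cilk-style work stealing."""
    return list(reversed(tasks))

def clustered_order(tasks, key):
    """Hypothetical clustered policy: run tasks that share a locality key
    back to back. Clusters are visited in order of first appearance."""
    clusters = defaultdict(list)
    for task in tasks:
        clusters[key(task)].append(task)
    ordered = []
    for members in clusters.values():
        ordered.extend(members)
    return ordered

def locality_switches(order, key):
    """Count consecutive task pairs that touch different data."""
    return sum(1 for a, b in zip(order, order[1:]) if key(a) != key(b))

# Assumed toy workload: (data partition, step) pairs spawned alternately.
tasks = [("A", 1), ("B", 1), ("A", 2), ("B", 2), ("A", 3), ("B", 3)]
partition = lambda task: task[0]

lifo_switches = locality_switches(lifo_order(tasks), partition)
clustered_switches = locality_switches(clustered_order(tasks, partition), partition)
print(lifo_switches, clustered_switches)
```

On this workload the LIFO order alternates between partitions on every step, while the clustered order changes partition only once. A real scheduler would also have to balance load across workers, which this sketch ignores.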
Similar resources
Exploiting Parallelism in Association Rule Mining Algorithms
Association rule mining, one of the major techniques of data mining, involves finding frequent itemsets with minimum support and generating association rules among them with minimum confidence. Finding all frequent itemsets in a large dataset requires a lot of computation, which can be reduced by exploiting parallelism in the sequential algorithms. In this paper, we provide th...
Data Mining: Pattern Mining as a Clique Extracting Task
One of the important tasks in solving data mining problems is finding frequent patterns in a given dataset. It makes it possible to handle several tasks such as pattern mining, discovering association rules, clustering, etc. There are several algorithms to solve this problem. In this paper we describe our task and results: a method for reordering a data matrix to give it a more informative form, problems o...
Efficient Frequent Pattern Mining on Web Logs
Mining frequent patterns from Web logs is an important data mining task. Candidate-generation-and-test and pattern-growth are two representative frequent pattern mining approaches. We have conducted extensive experiments on real-world Web log data to analyse the characteristics of Web logs and the behaviours of these two approaches on Web logs. To improve the performance of current algorithms on...
Survey on Weighted Frequent Pattern Mining
Data mining is the collection of techniques for the resourceful, automatic discovery of previously unknown, suitable, novel, helpful and understandable patterns in large databases. Frequent pattern mining has emerged as a vital task in data mining. Frequent patterns are those that occur frequently in a data set. In traditional frequent pattern mining, patterns and items within the patterns are ...
Comparative Analysis of Various Approaches Used in Frequent Pattern Mining
Frequent pattern mining has become an important data mining task and a focus of data mining research. Frequent patterns are patterns that appear frequently in a data set. Frequent pattern mining searches for recurring relationships in a given data set. Various techniques have been proposed to improve the performance of frequent pattern mining algorithms. This paper presents revi...